1. Field of the Invention
The present invention relates to a method and apparatus for highlighting and categorizing documents and, more particularly, to an automatic method for highlighting words of a document relating to a specific topic of the document using word tokens and for categorizing the document into a pre-existing topical category.
2. Description of Related Art
Techniques for converting scanned image data into text data suitable for use in a digital computer are well known.
However, it has heretofore not been possible to use such techniques or systems to automatically highlight or otherwise "mark up" key words or phrases of a document, nor to automatically categorize a document into specific topic categories. Rather, as each document is provided to the system, key words corresponding to the specific topics of that document must also be provided. This has been accomplished by having an operator input the key words of the document to the system.